Large Scale Data Mining: The Challenges and The
Authors
Abstract
Data mining over large data sets is considered to be a very important research subject due to its obvious commercial potential. However, it is also a major challenge due to its complexity and computational intensity. Exploiting the inherent parallelism of data mining algorithms provides a direct solution by utilising the large data retrieval and processing power of parallel architectures. In this paper, we classify various data mining algorithms with respect to their most effective parallel structure. We study induction based classification algorithms, neural networks, clustering algorithms and genetic algorithms. This classification is based on our intensive research on the parallelisation of data mining algorithms. We also present a methodology for determining the proper parallelisation strategy based on the idea of algorithmic skeletons and performance modelling. This research aims to provide a systematic way to develop parallel data mining algorithms and applications.
Similar resources
Parallel and Distributed Data Mining: An Introduction
The explosive growth in data collection in business and scientific fields has literally forced upon us the need to analyze and mine useful knowledge from it. Data mining refers to the entire process of extracting useful and novel patterns/models from large datasets. Due to the huge size of data and amount of computation involved in data mining, high-performance computing is an essential compone...
Chapter 9 MINING TEXT STREAMS
The large amount of text data which are continuously produced over time in a variety of large scale applications such as social networks results in massive streams of data. Typically massive text streams are created by very large scale interactions of individuals, or by structured creations of particular kinds of content by dedicated organizations. An example in the latter category would be the...
Automatic Discovery of Technology Networks for Industrial-Scale R&D IT Projects via Data Mining
Industrial-Scale R&D IT Projects depend on many sub-technologies which need to be understood and have their risks analysed before the project can begin for their success. When planning such an industrial-scale project, the list of technologies and the associations of these technologies with each other is often complex and form a network. Discovery of this network of technologies is time consumi...
A Geometric View of Similarity Measures in Data Mining
The main objective of data mining is to acquire information from a set of data for prospect applications using a measure. The concerning issue is that one often has to deal with large scale data. Several dimensionality reduction techniques like various feature extraction methods have been developed to resolve the issue. However, the geometric view of the applied measure, as an additional consid...
Towards Web Search Engine Scale Data Mining
Data mining is one of the most critical driving technologies behind Web search engines. Web search engine scale data mining posts many grand challenges, ranging from efficiency and scalability to diversity and adaptability. In this talk, I will review our recent effort on mining a very large amount of data accumulated in one of the major commercial search engines. Particularly, we tackle the pr...
Publication date: 1997